Automatically set EncodingType to url #726

Merged

JordonPhillips merged 2 commits into boto:develop from s3_ls_encoding_type on Nov 30, 2015

Conversation

JordonPhillips
Contributor

Some unicode values can break the XML parser, so we have s3 url encode the keys and decode them on our end.
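The round trip being described can be sketched with the standard library alone (the sample key and the `quote`/`unquote` calls are illustrative, not botocore's actual code path):

```python
from urllib.parse import quote, unquote

# Hypothetical key; '+' and non-ASCII characters are the kind of
# values that can trip up an XML parser.
key = 'a+b/foo'

# With EncodingType=url, S3 percent-encodes the key in its response...
encoded = quote(key)        # 'a%2Bb/foo'

# ...and the client decodes it back before handing it to the caller.
decoded = unquote(encoded)  # 'a+b/foo'

assert decoded == key
```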

# This is needed because we are passing url as the encoding type. Since the
# paginator is based on the key, we need to handle it before it can be
# round tripped.
if 'Contents' in parsed and parsed.get('EncodingType') == 'url':
Contributor

I wonder if we should only automatically decode the URI-encoded values if the EncodingType was not explicitly set by the user? For example, if a user looks at the S3 API, sees that you can set an EncodingType, and sets it, they might reasonably expect the results they receive to stay encoded. In fact, that's probably the behavior they are expecting right now. If we automatically start decoding these, it would be a breaking change.

Contributor Author

There's not really any way to know that though.

Contributor

Yeah, I think we'd have to add support for that in order to make it backwards compatible.

@JordonPhillips JordonPhillips added the pr/needs-review This PR needs a review from a member of the team. label Nov 24, 2015
@@ -176,12 +176,12 @@ def test_can_delete_urlencoded_object(self):
         bucket_contents = self.client.list_objects(
             Bucket=self.bucket_name)['Contents']
         self.assertEqual(len(bucket_contents), 1)
-        self.assertEqual(bucket_contents[0]['Key'], 'a+b/foo')
+        self.assertEqual(bucket_contents[0]['Key'], u'a+b/foo')
Contributor

Why do you need to add the unicode prefix to these? I feel like these should still pass with the unicode prefix removed.

@kyleknap
Copy link
Contributor

Implementation looks fine. It would be nice to have an integration test for this: add a key that would normally break the XML parser, show that it is now handled by auto-populating the encoding type, and also show that the key stays url encoded (not automatically decoded) when the user specifies EncodingType themselves.

self.create_object(key_name)
parsed = self.client.list_objects(Bucket=self.bucket_name)
self.assertEqual(len(parsed['Contents']), 1)
self.assertEqual(parsed['Contents'][0]['Key'], key_name)
Contributor
Could we also do a second call where we specify EncodingType=url and show that it is not being url decoded? You could just add it to this test.
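What that second assertion would check can be sketched offline: with an explicit EncodingType=url, the key should come back still percent-encoded. The expected value below is computed with stdlib `quote()`, not taken from a live S3 response:

```python
from urllib.parse import quote

key_name = 'a+b/foo'

# If the user passes EncodingType='url' themselves, the client should
# leave the response encoded, so the test would expect the quoted form.
expected_encoded_key = quote(key_name)

print(expected_encoded_key)  # a%2Bb/foo
```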

@kyleknap
Copy link
Contributor

Just one more thing, then it should be good to merge. Also make sure you have a corresponding CLI PR so we can merge the two together and avoid the test failures we'd get from merging this PR alone.

@JordonPhillips
Copy link
Contributor Author

👍

@kyleknap
Copy link
Contributor

Thanks 🚢. We can merge this when the CLI PR is made to update the CLI based off of these changes.

@mtdowling
Copy link
Contributor

:shipit:

# paginator is based on the key, we need to handle it before it can be
# round tripped.
if 'Contents' in parsed and parsed.get('EncodingType') == 'url' and \
kwargs['context'].get('EncodingTypeAutoSet') == True:
Member

and context.get('EncodingTypeAutoSet', False):

or just:

and context.get('EncodingTypeAutoSet')
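The point of the suggestion is that `dict.get` already returns `None` for a missing key, which is falsy on its own, so the explicit `== True` comparison adds nothing. A quick sketch:

```python
context = {}

# Missing key: .get returns None, which is already falsy.
assert not context.get('EncodingTypeAutoSet')
assert not (context.get('EncodingTypeAutoSet') == True)

# Once the handler sets the flag, plain truthiness is enough.
context['EncodingTypeAutoSet'] = True
assert context.get('EncodingTypeAutoSet')
```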

@jamesls
Copy link
Member

jamesls commented Nov 25, 2015

:shipit: Just had a few small comments.

JordonPhillips added a commit that referenced this pull request Nov 30, 2015
Automatically set EncodingType to url
@JordonPhillips JordonPhillips merged commit 843a66c into boto:develop Nov 30, 2015
@JordonPhillips JordonPhillips deleted the s3_ls_encoding_type branch March 21, 2016 21:28
tipabu added a commit to tipabu/s3-tests that referenced this pull request Nov 25, 2019
Following boto/botocore#726, list_objects will
automatically include an encoding-type=url query param. However, the
client does not decode all of the response elements properly -- notably,
Prefix would remain encoded.

Hopefully this will be fixed soon-ish (I've got a patch proposed at
boto/botocore#1901) but in the meantime, use the
work-around suggested in boto/boto3#816 of
unregistering the set_list_objects_encoding_type_url handler.

Signed-off-by: Tim Burke <tim.burke@gmail.com>
tipabu added a commit to tipabu/s3-tests that referenced this pull request Apr 4, 2022